An Evaluation of Layout Features for Information Extraction from Calls for Papers

Author

  • Karl-Michael Schneider
Abstract

We describe a feature-rich conditional random field model for the extraction of conference and workshop information (e.g. name, date, location, deadline) from calls for papers (CFPs). This has applications in the automatic construction of a conference knowledge base from a collection of CFPs. Relevant information in CFPs is often contained in regions that do not contain complete, grammatical sentences, but can be distinguished visually from other parts of the text by their formatting. We show that in this situation layout features, i.e. features that measure physical layout properties of a text, improve extraction accuracy considerably. On a corpus of CFPs we observe a 30% gain in F1 through the use of layout features.
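The abstract does not enumerate the layout features that were used. As a rough illustration only, the sketch below computes a few plausible per-line layout properties (indentation, surrounding blank lines, centering, capitalization) for a plain-text CFP. The function name `layout_features`, the feature names, the 80-column `width`, and the centering tolerance are all assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, List


def layout_features(lines: List[str], index: int, width: int = 80) -> Dict[str, object]:
    """Hypothetical layout features for one line of a plain-text CFP.

    The feature set is illustrative; it is not the paper's actual feature set.
    """
    line = lines[index]
    stripped = line.strip()
    # Number of leading spaces on the line.
    indent = len(line) - len(line.lstrip(" "))
    # Unused columns to the right of the text, given an assumed page width.
    trailing = max(width - len(line.rstrip()), 0)
    letters = [c for c in stripped if c.isalpha()]
    return {
        "blank": stripped == "",
        "indent": indent,
        # The first and last lines are treated as bordered by a blank line.
        "prev_blank": index == 0 or lines[index - 1].strip() == "",
        "next_blank": index == len(lines) - 1 or lines[index + 1].strip() == "",
        # Roughly equal left and right margins suggest a centered line.
        "centered": stripped != "" and indent > 0 and abs(indent - trailing) <= 2,
        "all_caps": bool(letters) and all(c.isupper() for c in letters),
        "short": 0 < len(stripped) < width // 2,
    }


if __name__ == "__main__":
    cfp = [
        "FIRST WORKSHOP ON EXAMPLES".center(80),
        "",
        "Submission deadline: 1 May",
    ]
    for i in range(len(cfp)):
        print(layout_features(cfp, i))
```

Feature dictionaries of this kind, one per line or token, could be combined with lexical features and passed to a conditional random field toolkit; how the paper actually encodes its features is not stated in the abstract.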


Similar resources

Performance Evaluation of Medical Image Retrieval Systems Based on a Systematic Review of the Current Literature

Background and Aim: Images, as information vehicles that can convey a large volume of information, are especially important in medicine. The existence of different image feature attributes and various search algorithms in medical image retrieval systems, and the lack of an authority to evaluate the quality of retrieval systems, make a systematic review in medical image retrieval system...


A Ten-year Report of Drug and Poison Information Center in Mashhad, Iran 2007-2017

The Mashhad drug and poison information center (MDPIC) was officially established in 2000 to provide up-to-date information on medications. The objective of this study is to provide an epidemiologic profile of drug inquiry and poisoning-related phone calls to MDPIC from 2007 to 2017. This article is a descriptive retrospective study in which all inquiries about drugs and poisoning cases receive...


Model-Guided Segmentation and Layout Labelling of Document Images Using a Hierarchical Conditional Random Field

We present a model-guided segmentation and document layout extraction scheme based on hierarchical Conditional Random Fields (CRFs, hereafter). Common methods to classify a pixel of a document image into classes text, background and image are often noisy, and error-prone, often requiring post-processing through heuristic methods. The input to the system is a pixel-wise classification based on t...


Evaluation of Updating Methods in Building Blocks Dataset

With the increasing use of spatial data in daily life, the production of this data from diverse information sources with different precision and scales has grown widely. Generating new data requires a great deal of time and money. Therefore, one way to reduce costs is to update the old data at different scales using new data (produced on a similar scale). One approach to updating data i...




Publication date: 2005